Learning to Read L’Infinito: Handwritten Text Recognition with Synthetic Training Data
Abstract
Deep learning-based approaches to Handwritten Text Recognition (HTR) have shown remarkable results on publicly available large datasets, both modern and historical. However, it is often the case that historical manuscripts are preserved in small collections, most of the time with unique characteristics in terms of paper support, author handwriting style, and language. State-of-the-art HTR approaches struggle to obtain good performance on such manuscript collections, for which few training samples are available. In this paper, we focus on such small datasets and propose a new dataset, which we call Leopardi, typical of these collections and consisting of letters by the poet Giacomo Leopardi, and we devise strategies to deal with the data scarcity scenario. In particular, we explore the use of carefully designed but cost-effective synthetic data for pre-training HTR models to be applied to single-author manuscripts. Extensive experiments validate the suitability of the proposed approach, and the Leopardi dataset will favor further research in this direction.
Related resources
Self-training for Handwritten Text Line Recognition
Off-line handwriting recognition deals with the task of automatically recognizing handwritten text from images, for example from scanned sheets of paper. Due to the tremendous variations of writing styles encountered between different individuals, this is a very challenging task. Traditionally, a recognition system is trained by using a large corpus of handwritten text that has to be transcribe...
Active Learning for Historic Handwritten Text Recognition
This thesis examines the use of active learning for the task of handwritten text recognition in historical documents. Active learning is a machine learning paradigm which enables the learner to select the data that is being trained on. In domains where procuring annotated data is expensive but there are large amounts of unlabelled data available, active learning can lead to better models with t...
Using a Synthetic Character Database for Training Deep Learning Models Applied to Offline Handwritten Recognition
We present our current work on building a deep learning architecture for the offline handwritten character recognition problem. The proposed system is based on training a deep Convolutional Neural Network (CNN) to recognize handwritten characters, using a new synthetic character database derived from UNIPEN dataset. The presented approach is inspired in some successfully-used neural architectur...
Generating Synthetic Data for Text Recognition
Generating synthetic images is an art which emulates the natural process of image generation in a closest possible manner. In this work, we exploit such a framework for data generation in handwritten domain. We render synthetic data using open source fonts and incorporate data augmentation schemes. As part of this work, we release 9M synthetic handwritten word image corpus which could be useful...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2021
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-030-89131-2_31